200311225-1 



CLAIMS 

What is claimed is: 

1. A method for persistently tracking volatile memory faults, the method
comprising: 

detecting a memory error relating to at least one dynamic random access 
memory (DRAM) unit on a particular memory module; and 

writing an entry pertaining to the memory error in non-volatile memory of
a fault storage unit on that particular memory module.
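As a rough illustration of claim 1, the sketch below models each memory module as an object that carries its own fault log, so a recorded entry stays with the module rather than with the host. It is a minimal sketch under stated assumptions: the names `MemoryModule`, `FaultEntry` and `record_memory_error` are hypothetical, and a Python list stands in for the module's non-volatile fault storage.

```python
from dataclasses import dataclass, field


@dataclass
class FaultEntry:
    dram_id: int   # which DRAM unit on the module failed
    low_bit: int   # first bit of the failing range
    high_bit: int  # last bit of the failing range (inclusive)


@dataclass
class MemoryModule:
    """Hypothetical model of one memory module (for example, a DIMM)."""
    module_id: int
    num_drams: int
    # Stand-in for the module's non-volatile fault storage unit.
    fault_log: list = field(default_factory=list)


def record_memory_error(modules, module_id, dram_id, low_bit, high_bit):
    """Write an entry for a detected error into the fault store of the
    module that owns the failing DRAM (not into host-side storage)."""
    module = modules[module_id]
    if not 0 <= dram_id < module.num_drams:
        raise ValueError("DRAM id is not on this module")
    entry = FaultEntry(dram_id, low_bit, high_bit)
    module.fault_log.append(entry)
    return entry
```

For example, `record_memory_error(modules, 1, 5, 64, 127)` appends one entry to module 1's log and leaves every other module's log untouched. That property lets the fault history follow the module if it is moved to another slot or system.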

2. The method of claim 1, wherein the particular memory module comprises
a particular dual in-line memory module (DIMM) of a plurality of DIMMs in 
a memory system. 

3. The method of claim 1, further comprising:

determining a scope of the detected memory error.

4. The method of claim 3, wherein the scope of the memory error is 
determined by a logical analysis of a history of faults associated with the 
particular memory module. 
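Claim 4 leaves the "logical analysis" open. One possible heuristic, shown here only as a hypothetical sketch, merges the bit ranges recorded for one DRAM and classifies the fault as a single bit, a bounded range, or the whole DRAM. The function name `determine_scope`, the tuple layout of `history`, and the `max_ranges` threshold are all assumptions rather than details from the claims.

```python
def determine_scope(history, dram_id, max_ranges=4):
    """Classify the scope of faults on one DRAM from the module's fault history.

    history: iterable of (dram_id, low_bit, high_bit) tuples (assumed layout).
    The max_ranges threshold is an illustrative assumption.
    """
    ranges = sorted((lo, hi) for d, lo, hi in history if d == dram_id)
    if not ranges:
        return "none"
    # Merge overlapping or adjacent ranges so repeated reports of the
    # same fault are counted once.
    merged = [list(ranges[0])]
    for lo, hi in ranges[1:]:
        if lo <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    if len(merged) > max_ranges:
        return "dram"   # faults are scattered: treat the whole DRAM as bad
    if len(merged) == 1 and merged[0][0] == merged[0][1]:
        return "bit"    # a single bit, possibly reported more than once
    return "range"      # one or a few bounded bit ranges
```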

5. The method of claim 1, wherein the entry comprises a DRAM unit
identifier, a low bit number of a range, a high bit number of the range, and 
tag bits indicating time of last failure and number of occurrences of failure. 
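The entry format of claim 5 could be serialized into a fixed-size record such as the one below. The field widths (a 1-byte DRAM identifier, 32-bit bit numbers, a 32-bit timestamp in seconds and a 16-bit occurrence counter), the little-endian byte order and the helper names are assumptions chosen for illustration; the claim does not specify sizes or layout.

```python
import struct
from typing import NamedTuple

# Hypothetical on-module layout: little-endian, no padding.
# B = DRAM id, I = low bit, I = high bit,
# I = last-failure timestamp (seconds), H = occurrence count
ENTRY_FORMAT = "<BIIIH"
ENTRY_SIZE = struct.calcsize(ENTRY_FORMAT)  # 15 bytes


class FaultRecord(NamedTuple):
    dram_id: int
    low_bit: int
    high_bit: int
    last_failure: int
    occurrences: int


def pack_entry(rec: FaultRecord) -> bytes:
    """Serialize a record for storage in the fault storage unit."""
    return struct.pack(ENTRY_FORMAT, *rec)


def unpack_entry(raw: bytes) -> FaultRecord:
    """Parse a record read back from the fault storage unit."""
    return FaultRecord(*struct.unpack(ENTRY_FORMAT, raw))


def record_repeat(rec: FaultRecord, now: int) -> FaultRecord:
    """Update the tag fields when the same range fails again;
    the counter saturates at the 16-bit field maximum."""
    return rec._replace(last_failure=now,
                        occurrences=min(rec.occurrences + 1, 0xFFFF))
```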

6. The method of claim 1, further comprising:

reading the entry from the non-volatile memory of the fault storage unit; 
and 

removing memory bits associated with the memory error from a set of 
usable memory. 

7. The method of claim 1, further comprising:

removing memory bits associated with the memory error from a set of
usable memory while the particular memory module remains
online.
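Claims 6 and 7 describe removing the affected bits from the set of usable memory. A minimal sketch, assuming the usable set is tracked as page numbers and assuming a deliberately simplified mapping in which DRAM `d` backs a contiguous block of the module's bit space, could look like this. `retire_faulty_pages`, `PAGE_BITS` and the mapping itself are hypothetical.

```python
PAGE_BITS = 4096 * 8  # assumed 4 KiB pages, expressed in bits


def retire_faulty_pages(usable_pages, entries, dram_bits, page_bits=PAGE_BITS):
    """Remove from usable_pages (a set of page numbers) every page that
    touches a recorded fault, and return the pages newly removed.

    entries: iterable of (dram_id, low_bit, high_bit).
    Hypothetical mapping: DRAM d backs module bits
    [d * dram_bits, (d + 1) * dram_bits). Real modules interleave DRAMs,
    so a real mapping would come from the memory controller configuration.
    """
    retired = set()
    for dram_id, low_bit, high_bit in entries:
        base = dram_id * dram_bits
        first = (base + low_bit) // page_bits
        last = (base + high_bit) // page_bits
        retired.update(range(first, last + 1))
    retired &= usable_pages      # only report pages that were still usable
    usable_pages -= retired      # in-place update; the module stays online
    return retired
```

Because only the software view of usable memory changes, the retired pages can be excluded while the module keeps serving its remaining addresses, which corresponds to the online removal of claim 7.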

8. A memory module that persistently tracks volatile memory faults, the 
memory module comprising: 

a plurality of dynamic random access memories (DRAMs); and 

a fault storage unit including non-volatile memory configured to store
entries pertaining to faults in the plurality of DRAMs on that
memory module.

9. The memory module of claim 8, further comprising: 

interface circuitry configured to provide read and write access by a
memory error interface unit on a circuit board to the non-volatile
memory of the fault storage unit.

10. The memory module of claim 8, wherein an entry stored in the
non-volatile memory of the fault storage unit includes a DRAM identifier
and a range of bits.

11. The memory module of claim 8, wherein the memory module comprises a
dual in-line memory module (DIMM). 

12. A circuit board of a system, the circuit board comprising: 

a plurality of connectors, each connector configured to connect to a
memory module which includes multiple volatile memory units and
a non-volatile fault storage unit;

a memory controller configured to read and write data into the volatile
memory units of memory modules; and

a memory error interface configured to provide read and write access to
the non-volatile fault storage units of the memory modules.

13. The circuit board of claim 12, further comprising:

a processor dependent hardware (PDH) interface communicatively
coupled between a central processing unit and the memory error
interface.

14. The circuit board of claim 13, further comprising:

a processor dependent code (PDC) unit accessible via the PDH interface, 
wherein the PDC unit includes boot code and error handling code. 

15. The circuit board of claim 14, wherein the boot code includes instructions 
to read the entries from the non-volatile fault storage unit and to remove 
memory bits associated with the entries from a set of usable memory. 

16. The circuit board of claim 14, wherein the error handling code includes
instructions to write entries relating to detected memory errors into the
non-volatile fault storage unit and to read said entries from the
non-volatile fault storage unit.

17. The circuit board of claim 12, wherein the volatile memory units comprise 
dynamic random access memory, and wherein the plurality of memory 
modules comprise dual in-line memory modules (DIMMs). 

18. A memory system comprising: 

means for reading data from and writing data to volatile memory units on 
a plurality of memory modules; and 

means for reading error entries from and writing error entries to a
non-volatile fault storage unit on each memory module.



